Edler 1-4 



23, 24, 28, and 29 are proposed to be amended herein. The present amendment is accompanied by a
petition fee for extension of time (one month).

In the Office Action, the Examiner objected to the Abstract of the disclosure because its
length exceeds 150 words and objected to the disclosure due to indicated informalities. The Examiner
also objected to the specification as failing to provide proper antecedent basis for the claimed subject
matter. The Examiner rejected claims 1, 6-9, 13 and 30-31 under 35 U.S.C. §103(a) as being
unpatentable over Srinivasan et al. (IEEE Transactions on Signal Processing, vol. 46, April 1998), in
view of Johnston (United States Patent Number 5,481,614), rejected claims 2, 5, 10-12, 14 and 17-19
under 35 U.S.C. §103(a) as being unpatentable over Srinivasan et al. in view of Johnston, and further in
view of admitted prior art, and rejected claims 3-4, 15-16, 20-29 and 32-33 under 35 U.S.C. §103(a) as
being unpatentable over Srinivasan et al. in view of Johnston, and further in view of well-known prior
art.

Formal Objections 

The Examiner objected to the Abstract as being too long. The Abstract has been
amended to ensure that it does not exceed 150 words. Thus, Applicants respectfully request that the
objection to the Abstract under MPEP §608.01(b) be withdrawn.

The disclosure was also objected to because of the following informality: the phrase 
"does need not need to be transmitted..." on page 6, line 8 requires appropriate correction. The 
disclosure has been amended to correct the indicated typographical error and Applicants respectfully 
request that the Examiner's objection be withdrawn. 
The Examiner also objected to the specification as failing to provide proper antecedent
basis for the claimed subject matter.

Regarding claims 5, 17, 23, and 28, the Examiner asserts that the limitations "the filter
order" and "the intervals" are not clear, since Applicants have failed to provide a particular order and
particular intervals for the filter prior to the instant claims. The limitation "the intervals" also lacks
antecedent basis in the specification.

The "order" and "interval" of filters are specifications or parameters associated with
filters that are well understood by persons of ordinary skill in the art. Specific values for these
parameters are beyond the scope of the invention. Regarding the lack of antecedent basis in the
specification for the limitation "the intervals," claims 5, 17, 23, and 28 have been amended to make
them more definite in compliance with Section 112.

Regarding claim 7, the Examiner asserts that the limitations "an image signal" and
"visibility threshold" lack antecedent basis in the specification.

The specification has been amended to provide proper antecedent basis for the indicated
limitations.

Regarding claims 12 and 19, the Examiner asserts that the limitation "the coding stage for
filter coefficients" has insufficient antecedent basis in the claim.

Claims 12 and 19 have been amended to make them more definite in compliance with
Section 112.

Applicants believe that these amendments address the Examiner's concerns under
Section 112, and respectfully request that the rejections under Section 112, second paragraph, be
withdrawn.

Independent Claims 1, 13, 20, 25 and 30-33 

Independent claims 1, 13 and 30-31 were rejected under 35 U.S.C. §103(a) as being
unpatentable over Srinivasan et al. in view of Johnston, and claims 20, 25, and 32-33 were rejected
under 35 U.S.C. §103(a) as being unpatentable over Srinivasan et al. in view of Johnston, and further in
view of well-known prior art.

Regarding claim 1, the Examiner asserts that Srinivasan teaches an adaptive filter
producing a filter output signal and having a magnitude response that approximates an inverse of the
masked threshold. Applicants note that Srinivasan teaches the use of a filter bank, known in the art to
be composed of filters with fixed (i.e., non-adaptive) impulse responses. See Fig. 1. Srinivasan
teaches splitting the input spectrum into two or more bands. See Fig. 2 and related text on page 1087.
Therefore, the sub-band filters should have band-pass characteristics. Srinivasan uses a cascaded
structure of two-band filter banks, each splitting its input spectrum into two halves. The structure of the
resulting filter bank, i.e., the number of cascades, is adaptive. This results in a variation of the structure
as shown in Fig. 5. This variation, however, only affects the resulting frequency resolution, not the
overall magnitude responses. Independent claims 1, 13, 20, 25 and 30-33 require "said adaptive filter
producing a filter output signal and having a magnitude response that approximates an inverse of the
masked threshold."
Thus, Srinivasan and Johnston, alone or in combination, do not disclose or suggest "said
adaptive filter producing a filter output signal and having a magnitude response that approximates an
inverse of the masked threshold," as required by independent claims 1, 13, 20, 25 and 30-33.



Dependent Claims 2-12, 14-19, 21-24 and 26-29

Dependent claims 6-9 were rejected under 35 U.S.C. §103(a) as being unpatentable over
Srinivasan et al. in view of Johnston, claims 2, 5, 10-12, 14 and 17-19 were rejected under 35 U.S.C.
§103(a) as being unpatentable over Srinivasan et al. in view of Johnston, and further in view of admitted
prior art, and claims 3-4, 15-16, 21-24, and 26-29 were rejected under 35 U.S.C. §103(a) as being
unpatentable over Srinivasan et al. in view of Johnston, and further in view of well-known prior art.

Claims 2-12, 14-19, 21-24 and 26-29 are dependent on claims 1, 13, 20, and 25,
respectively, and are therefore patentably distinguished over Srinivasan et al. and Johnston, and
admitted and well-known prior art, alone or in any combination, because of their dependency from
independent claims 1, 13, 20, and 25 for the reasons set forth above, as well as for the additional
elements these claims add in combination with their base claims.

All of the pending claims, i.e., claims 1 through 33, are in condition for allowance, and
such favorable action is earnestly solicited.

If any outstanding issues remain, or if the Examiner has any further suggestions for
expediting allowance of this application, the Examiner is invited to contact the undersigned at the
telephone number indicated below.

The Examiner's attention to this matter is appreciated.

Respectfully submitted,

Date: July 2, 2003 



Kevin M. Mason 
Attorney for Applicant(s) 
Reg. No. 36,597 
Ryan, Mason & Lewis, LLP 
1300 Post Road, Suite 205 
Fairfield, CT 06824 
(203) 255-6560 
VERSION MARKED TO SHOW ALL CHANGES

IN THE ABSTRACT:

Please amend the Abstract as indicated below: 



A perceptual audio coder is disclosed for encoding audio signals, such as speech or
music, with different spectral and temporal resolutions for redundancy reduction and irrelevancy
reduction. The disclosed perceptual audio coder separates the psychoacoustic model (irrelevancy
reduction) from the redundancy reduction, to the extent possible. The audio signal is initially spectrally
shaped using a prefilter controlled by a psychoacoustic model. The prefilter output samples are
thereafter quantized and coded to minimize the mean square error [(MSE)] across the spectrum. The
disclosed perceptual audio coder can use fixed quantizer step-sizes, since spectral shaping is performed
by the pre-filter prior to quantization and coding. The disclosed pre-filter and post-filter support the
appropriate frequency dependent temporal and spectral resolution for irrelevancy reduction. A filter
structure based on a frequency-warping technique is used that allows filter design based on a non-linear
frequency scale. [The characteristics of the pre-filter may be adapted to the masked thresholds (as
generated by the psychoacoustic model), using techniques known from speech coding, where
linear-predictive coefficient (LPC) filter parameters are used to model the spectral envelope of the speech
signal. Likewise, the filter coefficients may be efficiently transmitted to the decoder for use by the
post-filter using well-established techniques from speech coding, such as an LSP (line spectral pairs)
representation, temporal interpolation, or vector quantization.]

IN THE CLAIMS: 

Please amend the claims as indicated below: 

1. (Unamended) A method for encoding a signal, comprising the steps of:

filtering said signal using an adaptive filter controlled by a psychoacoustic model, said
adaptive filter producing a filter output signal and having a magnitude response that approximates an
inverse of the masked threshold; and

quantizing and encoding the filter output signal together with side information for filter
adaptation control.
2. (Unamended) The method of claim 1, wherein said quantizing and encoding step uses a
transform or analysis filter bank suitable for redundancy reduction.

3. (Unamended) The method of claim 1, further comprising the steps of quantizing and
encoding spectral components obtained from a transform or analysis filter bank, and wherein said
quantizing and encoding steps employ fixed quantizer step sizes.

4. (Amended) The method of claim 1, wherein said quantizing and encoding step reduces
the mean square error [(MSE)] in said signal.

5. (Amended) The method of claim 1, wherein a [the] filter order and [the] intervals of
filter adaptation of said adaptive filter are selected suitable for irrelevancy reduction.

6. (Unamended) The method of claim 1, wherein said signal is an audio signal. 

7. (Unamended) The method of claim 1, wherein said signal is an image signal and said
adaptive filter is controlled in a way that said magnitude response approximates an inverse of a visibility
threshold.



8. (Unamended) The method of claim 1, further comprising the step of transmitting said
encoded signal to a decoder.

9. (Unamended) The method of claim 1, further comprising the step of recording said 
encoded signal on a storage medium. 

10. (Unamended) The method of claim 1, wherein said encoding further comprises the step
of employing an adaptive Huffman coding technique.

11. (Unamended) The method of claim 1, wherein said filtering step is based on a
frequency-warping technique using a non-linear frequency scale.
12. (Amended) The method of claim 1, wherein the [coding] encoding stage for filter
coefficients comprises a conversion from linear-predictive coefficient [LPC] filter coefficients to lattice
coefficients or to Line Spectrum Pairs.

13. (Unamended) A method for encoding a signal, comprising the steps of:

filtering said signal using an adaptive filter controlled by a psychoacoustic model, said
adaptive filter producing a filter output signal and having a magnitude response that approximates an
inverse of the masked threshold; and

transforming the filter output signal using a plurality of subbands suitable for redundancy
reduction; and

quantizing and encoding the subband signals together with side information for filter
adaptation control.

14. (Unamended) The method of claim 13, wherein said quantizing and encoding step uses a
transform or analysis filter bank suitable for redundancy reduction.

15. (Unamended) The method of claim 13, further comprising the steps of quantizing and 
encoding spectral components obtained from a transform or analysis filter bank, and wherein said 
quantizing and encoding steps employ fixed quantizer step sizes. 

16. (Amended) The method of claim 13, wherein said quantizing and encoding step reduces
the mean square error [(MSE)] in said signal.

17. (Amended) The method of claim 13, wherein a [the] filter order and [the] intervals of 
filter adaptation of said adaptive filter are selected suitable for irrelevancy reduction. 


18. (Unamended) The method of claim 13, wherein said filtering step is based on a 
frequency-warping technique using a non-linear frequency scale. 
19. (Amended) The method of claim 13, wherein the [coding] encoding stage for filter
coefficients comprises a conversion from linear-predictive coefficient [LPC] filter coefficients to lattice
coefficients or to Line Spectrum Pairs.

20. (Unamended) A method for decoding a signal, comprising the steps of:

decoding and dequantizing said signal;

decoding side information for filter adaptation control transmitted with said signal; and

filtering the dequantized signal with an adaptive filter controlled by said decoded side
information, said adaptive filter producing a filter output signal and having a magnitude response that
approximates the masked threshold.

21. (Unamended) The method of claim 20, wherein said decoding and dequantizing step uses
an inverse transform or synthesis filter bank suitable for redundancy reduction.

22. (Unamended) The method of claim 20, further comprising the steps of decoding and
dequantizing spectral components obtained from a transform or synthesis filter bank, and wherein said
decoding and dequantizing steps employ fixed quantizer step sizes.

23. (Amended) The method of claim 20, wherein a [the] filter order and [the] intervals of 
filter adaptation of said adaptive filter are selected suitable for irrelevancy reduction. 


24. (Amended) The method of claim 20, wherein the decoding stage for filter coefficients 
comprises a conversion from lattice coefficients or to Line Spectrum Pairs to linear-predictive 
coefficient [LPC] filter coefficients. 

25. (Unamended) A method for decoding a signal transmitted using a plurality of
subband signals, comprising the steps of:

decoding and dequantizing said transmitted subband signals;

decoding side information for filter adaptation control transmitted with said signal;

transforming said subbands to a filter input signal; and

filtering the filter input signal with an adaptive filter controlled by said decoded side
information, said adaptive filter producing a filter output signal and having a magnitude response that
approximates the masked threshold.

26. (Unamended) The method of claim 25, wherein said decoding and dequantizing step uses
an inverse transform or synthesis filter bank suitable for redundancy reduction.

27. (Unamended) The method of claim 25, further comprising the steps of decoding and 
dequantizing spectral components obtained from a transform or synthesis filter bank, and wherein said 
decoding and dequantizing steps employ fixed quantizer step sizes. 


28. (Amended) The method of claim 25, wherein a [the] filter order and [the] intervals of 
filter adaptation of said adaptive filter are selected suitable for irrelevancy reduction. 

29. (Amended) The method of claim 25, wherein the decoding stage for filter coefficients
comprises a conversion from lattice coefficients or to Line Spectrum Pairs to linear-predictive
coefficient [LPC] filter coefficients.

30. (Unamended) An encoder for encoding a signal, comprising:

an adaptive filter controlled by a psychoacoustic model, said adaptive filter producing a
filter output signal and having a magnitude response that approximates an inverse of the masked
threshold; and

a quantizer/encoder for quantizing and encoding the filter output signal together with
side information for filter adaptation control.

31. (Unamended) An encoder for encoding a signal, comprising:

an adaptive filter controlled by a psychoacoustic model, said adaptive filter producing a
filter output signal and having a magnitude response that approximates an inverse of the masked
threshold; and

a plurality of subbands suitable for redundancy reduction for transforming the filter
output signal; and

a quantizer/encoder for quantizing and encoding the subband signals together with side
information for filter adaptation control.

32. (Unamended) A decoder for decoding a signal, comprising:

a decoder/dequantizer for decoding and dequantizing said signal and decoding side
information for filter adaptation control transmitted with said signal; and

an adaptive filter controlled by said decoded side information, said adaptive filter
producing a filter output signal and having a magnitude response that approximates the masked
threshold.

33. (Unamended) A decoder for decoding a signal transmitted using a plurality of
subband signals, comprising:

a decoder/dequantizer for decoding and dequantizing said transmitted subband signals
and decoding side information for filter adaptation control transmitted with said signal;

means for transforming said subbands to a filter input signal; and

an adaptive filter controlled by said decoded side information, said adaptive filter
producing a filter output signal and having a magnitude response that approximates the masked
threshold.
Marked Up Specification

PERCEPTUAL CODING OF AUDIO SIGNALS USING SEPARATED
IRRELEVANCY REDUCTION AND REDUNDANCY REDUCTION

Cross-Reference to Related Applications 

The present invention is related to United States Patent Application entitled
"Method and Apparatus for Representing Masked Thresholds in a Perceptual Audio Coder,"
(Attorney Docket Number Edler 2-2-6), United States Patent Application entitled "Perceptual
Coding of Audio Signals Using Cascaded Filterbanks for Performing Irrelevancy Reduction and
Redundancy Reduction With Different Spectral/Temporal Resolution," (Attorney Docket
Number Edler 3-4), United States Patent Application entitled "Method and Apparatus for
Reducing Aliasing in Cascaded Filter Banks," (Attorney Docket Number Schuller 5) and United
States Patent Application entitled "Method and Apparatus for Detecting Noise-Like Signal
Components," (Attorney Docket Number Fink Faller 3), filed contemporaneously herewith,
assigned to the assignee of the present invention and incorporated by reference herein.



Field of the Invention 

The present invention relates generally to audio coding techniques, and more
particularly, to perceptually-based coding of audio signals, such as speech and music signals.

Background of the Invention 

Perceptual audio coders (PAC) attempt to minimize the bit rate requirements for
the storage or transmission (or both) of digital audio data by the application of sophisticated
hearing models and signal processing techniques. Perceptual audio coders [(PAC)] are
described, for example, in D. Sinha et al., "The Perceptual Audio Coder," Digital Audio, Section
42, 42-1 to 42-18, (CRC Press, 1998), incorporated by reference herein. In the absence of
channel errors, a PAC is able to achieve near stereo compact disk (CD) audio quality at a rate of
approximately 128 kbps. At a lower rate of 96 kbps, the resulting quality is still fairly close to
that of compact disk [CD] audio for many important types of audio material.






Perceptual audio coders reduce the amount of information needed to represent an
audio signal by exploiting human perception and minimizing the perceived distortion for a given
bit rate. Perceptual audio coders first apply a time-frequency transform, which provides a
compact representation, followed by quantization of the spectral coefficients. FIG. 1 is a
schematic block diagram of a conventional perceptual audio coder 100. As shown in FIG. 1, a
typical perceptual audio coder 100 includes an analysis filterbank 110, a perceptual model 120, a
quantization and coding block 130 and a bitstream encoder/multiplexer 140.

The analysis filterbank 110 converts the input samples into a sub-sampled spectral
representation. The perceptual model 120 estimates the masked threshold of the signal. For each
spectral coefficient, the masked threshold gives the maximum coding error that can be introduced
into the audio signal while still maintaining perceptually transparent signal quality. The
quantization and coding block 130 quantizes and codes the spectral coefficients according to
the precision corresponding to the masked threshold estimate. Thus, the quantization noise is
hidden by the respective transmitted signal. Finally, the coded spectral coefficients and
additional side information are packed into a bitstream and transmitted to the decoder by the
bitstream encoder/multiplexer 140.
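For illustration only, the sketch below shows one common way a conventional coder can turn a masked threshold into quantizer precision. It assumes a uniform quantizer whose error behaves like uniform noise of power Δ²/12 and chooses, per band, the step size Δ whose noise power equals the threshold. The function name and the threshold values are hypothetical; the point is that these per-band step sizes are the kind of dynamic quantizer control information a conventional decoder must receive.

```python
import math


def step_sizes_from_threshold(thresholds):
    """Per-band step sizes for a conventional perceptual quantizer (sketch).

    Assumption: the error of a uniform quantizer with step size delta is
    modeled as uniform noise of power delta**2 / 12, so delta is chosen such
    that this noise power equals the masked threshold of the band.
    """
    return [math.sqrt(12.0 * t) for t in thresholds]


# Hypothetical masked-threshold powers for four bands (linear power units).
thresholds = [0.12, 0.03, 0.75, 3.0]
steps = step_sizes_from_threshold(thresholds)
# The step sizes differ from band to band, so a conventional coder has to
# send them (for example as scale factors) as side information.
print([round(s, 3) for s in steps])  # [1.2, 0.6, 3.0, 6.0]
```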

FIG. 2 is a schematic block diagram of a conventional perceptual audio decoder
200. As shown in FIG. 2, the perceptual audio decoder 200 includes a bitstream
decoder/demultiplexer 210, a decoding and inverse quantization block 220 and a synthesis
filterbank 230. The bitstream decoder/demultiplexer 210 parses and decodes the bitstream,
yielding the coded spectral coefficients and the side information. The decoding and inverse
quantization block 220 performs the decoding and inverse quantization of the quantized spectral
coefficients. The synthesis filterbank 230 transforms the spectral coefficients back into
the time domain.

Generally, the amount of information needed to represent an audio signal is
reduced using two well-known techniques, namely, irrelevancy reduction and redundancy
removal. Irrelevancy reduction techniques attempt to remove those portions of the audio signal
that would be, when decoded, perceptually irrelevant to a listener. This general concept is
described, for example, in U.S. Pat. No. 5,341,457, entitled "Perceptual Coding of Audio
Signals," by J. L. Hall and J. D. Johnston, issued on Aug. 23, 1994, incorporated by reference
herein.

Currently, most audio transform coding schemes implemented by the analysis
filterbank 110 to convert the input samples into a sub-sampled spectral representation employ a
single spectral decomposition for both irrelevancy reduction and redundancy reduction. The
irrelevancy reduction is obtained by dynamically controlling the quantizers in the quantization
and coding block 130 for the individual spectral components according to perceptual criteria
contained in the psychoacoustic model 120. This results in a temporally and spectrally shaped
quantization error after the inverse transform at the receiver 200. As shown in FIGS. 1 and 2, the
psychoacoustic model 120 controls the quantizers 130 for the spectral components and the
corresponding dequantizer 220 in the decoder 200. Thus, the dynamic quantizer control
information needs to be transmitted by the perceptual audio coder 100 as part of the side
information, in addition to the quantized spectral components.

The redundancy reduction is based on the decorrelating property of the transform.
For audio signals with high temporal correlations, this property leads to a concentration of the
signal energy in a relatively low number of spectral components, thereby reducing the amount of
information to be transmitted. By applying appropriate coding techniques, such as adaptive
Huffman coding, this leads to a very efficient signal representation.
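To make the link between energy concentration and entropy coding concrete, the sketch below builds a static Huffman code for a hypothetical block of quantized spectral values in which most values are zero. It is a plain static Huffman construction, not the adaptive Huffman coding named above, and the data are invented for illustration.

```python
import heapq
from collections import Counter


def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a static Huffman code."""
    freq = Counter(symbols)
    if len(freq) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {s: 1 for s in freq}
    # Heap entries are (weight, unique tie-breaker, {symbol: depth so far}).
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: d + 1 for s, d in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


# Hypothetical quantized spectral values: energy concentrated in few bins.
block = [0] * 12 + [1] * 4 + [-1] * 2 + [3, 5]
lengths = huffman_code_lengths(block)
total_bits = sum(lengths[v] for v in block)
# total_bits is 34 here, versus 20 * 3 = 60 bits for a fixed 3-bit code.
print(lengths, total_bits)
```

The frequent zero value receives a one-bit code, which is where the saving comes from when the transform has packed the energy into a few coefficients.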

One problem encountered in audio transform coding schemes is the selection of
the optimum transform length. The optimum transform length is directly related to the frequency
resolution. For relatively stationary signals, a long transform with a high frequency resolution is
desirable, thereby allowing for accurate shaping of the quantization error spectrum and providing
a high redundancy reduction. For transients in the audio signal, however, a shorter transform has
advantages due to its higher temporal resolution. This is mainly necessary to avoid temporal
spreading of quantization errors that may lead to echoes in the decoded signal.
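The trade-off can be put in numbers with the usual relations for a length-N DFT-type transform at sampling rate f_s, where Δf is the bin spacing and T_win the window duration. The 48 kHz sampling rate and the two transform lengths are illustrative assumptions, not values from this disclosure; for an MDCT, which produces N coefficients from a 2N-sample window, the figures shift by a factor of two.

```latex
\Delta f = \frac{f_s}{N}, \qquad T_{\mathrm{win}} = \frac{N}{f_s}

f_s = 48\,\mathrm{kHz},\ N = 2048:\quad \Delta f \approx 23.4\,\mathrm{Hz},\quad T_{\mathrm{win}} \approx 42.7\,\mathrm{ms}

f_s = 48\,\mathrm{kHz},\ N = 256:\quad \Delta f = 187.5\,\mathrm{Hz},\quad T_{\mathrm{win}} \approx 5.3\,\mathrm{ms}
```

The long transform resolves frequency eight times more finely, but it also spreads any quantization error over eight times as much time, which is the mechanism behind the echoes mentioned above.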

As shown in FIG. 1, however, conventional perceptual audio coders 100 typically
use a single spectral decomposition for both irrelevancy reduction and redundancy reduction.
Thus, the spectral/temporal resolution for the redundancy reduction and irrelevancy reduction
must be the same. While high spectral resolution yields a high degree of redundancy reduction,
the resulting long transform window size causes reverberation artifacts, impairing the irrelevancy
reduction. A need therefore exists for methods and apparatus for encoding audio signals that
permit independent selection of spectral and temporal resolutions for the redundancy reduction
and irrelevancy reduction. A further need exists for methods and apparatus for encoding speech
as well as music signals using a psychoacoustic model (a noise-shaping filter) and a transform.

Summary of the Invention 

Generally, a perceptual audio coder is disclosed for encoding audio signals, such
as speech or music, with different spectral and temporal resolutions for the redundancy reduction
and irrelevancy reduction. The disclosed perceptual audio coder separates the psychoacoustic
model (irrelevancy reduction) from the redundancy reduction, to the extent possible. The audio
signal is initially spectrally shaped using a prefilter controlled by a psychoacoustic model. The
prefilter output samples are thereafter quantized and coded to minimize the mean square error
(MSE) across the spectrum.

According to one aspect of the invention, the disclosed perceptual audio coder
uses fixed quantizer step-sizes, since spectral shaping is performed by the pre-filter prior to
quantization and coding. Thus, additional quantizer control information does not need to be
transmitted to the decoder, thereby conserving transmitted bits.

The disclosed pre-filter and corresponding post-filter in the perceptual audio
decoder support the appropriate frequency dependent temporal and spectral resolution for
irrelevancy reduction. A filter structure based on a frequency-warping technique is used that
allows filter design based on a non-linear frequency scale.

The characteristics of the pre-filter may be adapted to the masked thresholds (as
generated by the psychoacoustic model), using techniques known from speech coding, where
linear-predictive coefficient (LPC) filter parameters are used to model the spectral envelope of
the speech signal. Likewise, the filter coefficients may be efficiently transmitted to the decoder
for use by the post-filter using well-established techniques from speech coding, such as an LSP
(line spectral pairs) representation, temporal interpolation, or vector quantization.

A more complete understanding of the present invention, as well as further
features and advantages of the present invention, will be obtained by reference to the following
detailed description and drawings.
Brief Description of the Drawings

FIG. 1 is a schematic block diagram of a conventional perceptual audio coder; 

FIG. 2 is a schematic block diagram of a conventional perceptual audio decoder
corresponding to the perceptual audio coder of FIG. 1;

FIG. 3 is a schematic block diagram of a perceptual audio coder according to the 
present invention and its corresponding perceptual audio decoder; 

FIG. 4 illustrates a Finite Impulse Response (FIR) predictor of order P, and the
corresponding Infinite Impulse Response (IIR) predictor;

FIG. 5 illustrates a first order allpass filter; and

FIG. 6 is a schematic diagram of a Finite Impulse Response [FIR] filter and a
corresponding Infinite Impulse Response [IIR] filter exhibiting frequency warping in accordance
with one embodiment of the present invention.



Detailed Description

FIG. 3 is a schematic block diagram of a perceptual audio coder 300 according to
the present invention and its corresponding perceptual audio decoder 350, for communicating an
audio signal, such as speech or music. While the present invention is illustrated using audio
signals, it is noted that the present invention can be applied to the coding of other signals, such as
image signals, by exploiting the temporal, spectral, and spatial sensitivity of the human visual
system, as would be apparent to a person of ordinary skill in the art, based on the disclosure herein.

According to one feature of the present invention, the perceptual audio coder 300
separates the psychoacoustic model (irrelevancy reduction) from the redundancy reduction, to the
extent possible. Thus, the perceptual audio coder 300 initially performs a spectral shaping of the
audio signal using a prefilter 310 controlled by a psychoacoustic model 315. For a detailed
discussion of suitable psychoacoustic models, see, for example, D. Sinha et al., "The Perceptual
Audio Coder," Digital Audio, Section 42, 42-1 to 42-18, (CRC Press, 1998), incorporated by
reference above. Likewise, in the perceptual audio decoder 350, a post-filter 380 controlled by
the psychoacoustic model 315 inverts the effect of the pre-filter 310. As shown in FIG. 3, the
filter control information needs to be transmitted in the side information, in addition to the
quantized samples.
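As a minimal sketch of how a post-filter can undo a pre-filter, the code below assumes the simplest case: the pre-filter is an FIR filter A(z) = 1 + a1 z^-1 + ... + aP z^-P and the post-filter is the all-pole filter 1/A(z). The coefficient values are hypothetical and chosen so that 1/A(z) is stable; the frequency-warped structure of FIG. 6 and the quantizer between the two filters are not modeled.

```python
def pre_filter(a, x):
    """FIR pre-filter: y[n] = sum_j a[j] * x[n - j], with a[0] == 1."""
    return [sum(aj * x[n - j] for j, aj in enumerate(a) if n - j >= 0)
            for n in range(len(x))]


def post_filter(a, y):
    """All-pole post-filter 1/A(z): x[n] = y[n] - sum_{j>=1} a[j] * x[n - j]."""
    x = []
    for n in range(len(y)):
        acc = y[n]
        for j in range(1, len(a)):
            if n - j >= 0:
                acc -= a[j] * x[n - j]
        x.append(acc)
    return x


# Hypothetical coefficients: A(z) has zeros at z = 0.5 and z = 0.4,
# so the post-filter 1/A(z) is stable.
a = [1.0, -0.9, 0.2]
signal = [0.0, 1.0, 0.5, -0.3, 0.8, -1.2, 0.4, 0.0]
shaped = pre_filter(a, signal)      # what the quantizer/coder would see
restored = post_filter(a, shaped)   # decoder output if no quantization error were added
print(max(abs(u - v) for u, v in zip(signal, restored)))
```

In the coder of FIG. 3 the quantization error enters between the two filters, so it passes through the post-filter as well; that is what gives the error its perceptual shape.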

Quantizer/Coder 

The prefilter output samples are quantized and coded at stage 320. As discussed
further below, the redundancy reduction performed by the quantizer/coder 320 minimizes the
mean square error [(MSE)] across the spectrum.

Since the pre-filter 310 performs spectral shaping prior to quantization and
coding, the quantizer/coder 320 can employ fixed quantizer step-sizes. Thus, additional
quantizer control information, such as individual scale factors for different regions of the
spectrum, does [need] not need to be transmitted to the perceptual audio decoder 350.
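A fixed-step quantizer of the kind described can be very small. The sketch below is a uniform mid-tread quantizer with one step size shared by all samples; the step value and sample values are hypothetical, and the step is assumed to be known to both encoder and decoder, so only the integer indices would have to be transmitted.

```python
def quantize(samples, step):
    """Map each sample to the index of the nearest multiple of step."""
    return [round(s / step) for s in samples]


def dequantize(indices, step):
    """Reconstruct sample values from their integer indices."""
    return [q * step for q in indices]


STEP = 0.5  # hypothetical fixed step size, identical for all spectral regions
prefiltered = [0.1, -0.74, 1.26, 2.0]
indices = quantize(prefiltered, STEP)      # [0, -1, 3, 4]
reconstructed = dequantize(indices, STEP)  # [0.0, -0.5, 1.5, 2.0]
# With a fixed step, the reconstruction error never exceeds STEP / 2.
print(max(abs(x - y) for x, y in zip(prefiltered, reconstructed)))
```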

Well-known coding techniques, such as adaptive Huffman coding, may be
employed by the quantizer/coder stage 320. If a transform coding scheme is applied to the
pre-filtered signal by the quantizer/coder 320, the spectral and temporal resolution can be fully
optimized for achieving a maximum coding gain under a mean square error [(MSE)] criterion. As
discussed below, the perceptual noise shaping is performed by the post-filter 380. Assuming the
distortions introduced by the quantization are additive white noise, the temporal and spectral
structure of the noise at the output of the decoder 350 is fully determined by the characteristics of
the post-filter 380. It is noted that the quantizer/coder stage 320 can include a filterbank such as
the analysis filterbank 110 shown in FIG. 1. Likewise, the decoder/dequantizer stage 360 can
include a filterbank such as the synthesis filterbank 230 shown in FIG. 2.
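The additive-white-noise assumption above can be written out. If the quantization error is white with variance σ_q² and the post-filter has frequency response H_post(e^{jω}), the noise at the decoder output has the power spectrum below. The Δ²/12 value is the usual model for a uniform quantizer with fixed step size Δ and is an assumption of this sketch rather than a statement from the disclosure.

```latex
\Phi_{\mathrm{noise}}(\omega) = \sigma_q^2 \,\bigl|H_{\mathrm{post}}(e^{j\omega})\bigr|^2,
\qquad \sigma_q^2 \approx \frac{\Delta^2}{12}
```

Because σ_q² is the same at every frequency, the spectral shape of the output noise is set entirely by |H_post(e^{jω})|², which is why the post-filter, and not the quantizer, carries the perceptual noise shaping.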

Pre-Filter/Post-Filter Based on Psychoacoustic Model 

One implementation of the pre-filter 310 and post-filter 380 is discussed further
below in a section entitled "Structure of the Pre-Filter and Post-Filter." As discussed below, it is
advantageous if the structure of the pre-filter 310 and post-filter 380 also supports the appropriate
frequency dependent temporal and spectral resolution. Therefore, a filter structure based on a
frequency-warping technique is used which allows filter design on a non-linear frequency scale.
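The frequency mapping behind such a warped structure can be sketched for the common first-order allpass D(z) = (z^-1 - λ)/(1 - λz^-1), which is assumed here to correspond to the allpass of FIG. 5. Replacing each unit delay by D(z) maps a normalized frequency ω to the warped frequency computed below. The warping coefficient λ and the value 0.7 in the example are illustrative assumptions; this section of the disclosure does not give a value.

```python
import math


def warp_frequency(omega, lam):
    """Warped frequency produced by the first-order allpass
    D(z) = (z^-1 - lam) / (1 - lam * z^-1), for 0 <= omega <= pi, |lam| < 1.

    The result is the negative phase of D(e^{j*omega}). For lam > 0 low
    frequencies are stretched, giving finer resolution there.
    """
    return omega + 2.0 * math.atan2(lam * math.sin(omega),
                                    1.0 - lam * math.cos(omega))


def unwarp_frequency(warped, lam):
    """Inverse mapping: warping with -lam undoes warping with lam."""
    return warp_frequency(warped, -lam)


LAM = 0.7  # hypothetical warping coefficient
for f in (0.05, 0.25, 0.5, 0.75, 1.0):  # fractions of the Nyquist frequency
    w = f * math.pi
    print(f, round(warp_frequency(w, LAM) / math.pi, 3))
```

The inverse mapping is what a coder would use to resample a masked threshold from the linear frequency axis onto a uniform grid on the warped axis.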

For using the frequency warping technique, the masked threshold needs to be 
transformed to an appropriate non-linear (i.e. warped) frequency scale as follows. Generally, the 
resulting procedure to obtain the filter coefficients g is: 
- Application of the psychoacoustic model gives a masked threshold as power
(density) over frequency.

- A non-linear transformation of the frequency scale according to the frequency
warping, as discussed below, gives a transformed masked threshold.

- Application of linear-predictive coefficient [LPC] analysis / modeling techniques
leads to linear-predictive coefficient [LPC] filter coefficients h, which can be quantized and
coded using a transformation to lattice coefficients or line spectral pairs [LSPs].

- For use in the warped filter structure shown in FIG. 6, the LPC filter coefficients,
h, need to be converted to filter coefficients, g.
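The first two steps of this procedure can be sketched as follows, assuming a first-order allpass warping with the mapping omega_warped = omega + 2*arctan(a*sin(omega) / (1 - a*cos(omega))) and a masked threshold supplied as a function of normalized radian frequency in [0, pi]; the function names `warp` and `warped_threshold` are hypothetical. Sampling the threshold on a uniform grid of the warped axis is done by mapping each warped grid point back to linear frequency, which for a first-order allpass is the same mapping with the coefficient negated.

```python
import math


def warp(omega, a):
    """First-order allpass frequency mapping with warping coefficient a (|a| < 1):
    omega + 2*atan(a*sin(omega) / (1 - a*cos(omega)))."""
    # For |a| < 1 the denominator is positive, so atan2 equals the plain arctan.
    return omega + 2.0 * math.atan2(a * math.sin(omega), 1.0 - a * math.cos(omega))


def warped_threshold(threshold, num_points, a):
    """Sample a masked threshold, given as a function of linear radian frequency
    in [0, pi], on a uniform grid of the warped frequency axis."""
    samples = []
    for k in range(num_points):
        omega_warped = math.pi * k / (num_points - 1)
        # Mapping a warped frequency back to the linear axis uses the coefficient -a.
        omega_linear = warp(omega_warped, -a)
        samples.append(threshold(omega_linear))
    return samples


if __name__ == "__main__":
    # Hypothetical threshold that rises with frequency; with a = 0.5 the low-frequency
    # region occupies more of the warped grid than of a linear grid.
    example = warped_threshold(lambda w: 1.0 + w, 9, 0.5)
    print([round(v, 3) for v in example])
```

Because warping with a followed by warping with -a is the identity, the transformed threshold can always be mapped back to linear frequency for inspection.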

The characteristics of the filter 310 may be adapted to the masked thresholds (as
generated by the psychoacoustic model 315), using techniques known from speech coding, where
linear-predictive coefficient [(LPC)] filter parameters are used to model the spectral envelope of
the speech signal. In conventional speech coding techniques, the linear-predictive coefficient
[LPC] filter parameters are usually generated such that the spectral envelope of the analysis
filter output signal is maximally flat. In other words, the magnitude response of the linear-predictive
coefficient [LPC] analysis filter is an approximation of the inverse of the input spectral
envelope. The original envelope of the input spectrum is reconstructed in the decoder by the
linear-predictive coefficient [LPC] synthesis filter. Therefore, its magnitude response has to be
an approximation of the input spectral envelope. For a more detailed discussion of such
conventional speech coding techniques, see, for example, W.B. Kleijn and K.K. Paliwal, "An
Introduction to Speech Coding," in Speech Coding and Synthesis, Amsterdam: Elsevier (1995),
incorporated by reference herein.

In the case of an image signal, the adaptive filter is controlled in a way that the
magnitude response approximates an inverse of a corresponding visibility threshold, as would be
apparent to a person of ordinary skill in the art.

Similarly, the magnitude responses of the psychoacoustic post-filter 380 and pre-filter
310 should correspond to the masked threshold and its inverse, respectively. Due to this
similarity, known linear-predictive coefficient [LPC] analysis techniques can be applied, as
modified herein. Specifically, the known linear-predictive coefficient [LPC] analysis techniques
are modified such that the masked thresholds are used instead of short-term spectra. In addition,
for the pre-filter 310 and the post-filter 380, not only the shape of the spectral envelope has to be
addressed, but the average level has to be included in the model as well. This can be achieved by
a gain factor in the post-filter 380 that represents the average masked threshold level, and its
inverse in the pre-filter 310.
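LPC analysis applied to a masked threshold instead of a short-term spectrum could be sketched as below, assuming the (warped) threshold is given as power-density samples on a uniform grid over [0, pi]. The autocorrelation is obtained from the threshold by an inverse DFT, the Levinson-Durbin recursion yields the coefficients h, and the square root of the prediction-error power serves as the gain factor. For an exact all-pole fit, that squared gain equals the geometric mean of the threshold, i.e., its average level on a logarithmic scale. The function names and the choice of the autocorrelation method are illustrative assumptions; the specification does not prescribe a particular LPC algorithm.

```python
import math


def autocorrelation_from_psd(psd, max_lag):
    """Autocorrelation r[0..max_lag] of a real, even power density sampled at
    N + 1 uniform points on [0, pi] (inverse DFT over the full circle of 2N points)."""
    n = len(psd) - 1
    r = []
    for lag in range(max_lag + 1):
        acc = psd[0] + psd[n] * math.cos(math.pi * lag)
        for k in range(1, n):
            # Points k and 2N - k are mirror images, hence the factor 2.
            acc += 2.0 * psd[k] * math.cos(math.pi * k * lag / n)
        r.append(acc / (2 * n))
    return r


def levinson_durbin(r, order):
    """Predictor coefficients h_1..h_order for A(z) = 1 - sum_k h_k z^{-k}
    and the final prediction-error power."""
    h = []
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(h[j] * r[i - 1 - j] for j in range(len(h)))
        k = acc / err  # reflection (lattice) coefficient
        h = [h[j] - k * h[i - 2 - j] for j in range(len(h))] + [k]
        err *= 1.0 - k * k
    return h, err


def threshold_to_filter(threshold_samples, order):
    """Hypothetical helper: fit G / A(z) to a (warped) masked threshold.
    The post-filter uses G / A(z), the pre-filter its inverse A(z) / G."""
    r = autocorrelation_from_psd(threshold_samples, order)
    h, err = levinson_durbin(r, order)
    return h, math.sqrt(err)  # G**2 = prediction-error power


if __name__ == "__main__":
    # Threshold shaped like a first-order all-pole spectrum, scaled by 4.
    n = 512
    thr = [4.0 / (1.81 - 1.8 * math.cos(math.pi * k / n)) for k in range(n + 1)]
    h, g = threshold_to_filter(thr, 2)
    print(h, g)  # h close to [0.9, 0.0], g close to 2.0
```

Scaling the threshold changes only the gain, not h, which matches the split described above between the envelope shape (coefficients) and the average level (gain factor).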
Likewise, the filter coefficients may be efficiently transmitted using well-established
techniques from speech coding, such as an [LSP (]line spectral pairs[)]
representation, temporal interpolation, or vector quantization. For a detailed discussion of such
speech coding techniques, see, for example, F.K. Soong and B.-H. Juang, "Line Spectrum Pair
[(LSP)] and Speech Data Compression," in Proc. ICASSP (1984), incorporated by reference
herein.

One important advantage of the pre-filter concept of the present invention over
standard transform audio coding techniques is the greater flexibility in the temporal and spectral
adaptation to the shape of the masked threshold. Therefore, the properties of the human auditory
system should be taken into account in the selection of the filter structures. For a more detailed
discussion of the characteristics of the masking effects, see, for example, M. R. Schroeder et al.,
"Optimizing Digital Speech Coders By Exploiting Masking Properties Of The Human Ear,"
Journal of the Acoust. Soc. Am., v. 66, 1647-1652 (Dec. 1979); and J. H. Hall, "Auditory
Psychophysics For Coding Applications," The Digital Signal Processing Handbook (V. Madisetti
and D. B. Williams, eds.), 39-1:39-22, CRC Press, IEEE Press (1998), each incorporated by
reference herein.

Generally, the temporal behavior of masking is characterized by a relatively short rise
time, which starts even before the onset of a masking tone (masker), and a longer decay after the
masker is switched off. The actual extent of the masking effect also depends on the masker
frequency, leading to an increase of the temporal resolution with increasing frequency.

For stationary single-tone maskers, the spectral shape of the masked threshold is
spread around the masker frequency with a larger extent towards higher frequencies than towards
lower frequencies. Both of these slopes strongly depend on the masker frequency, leading to a
decrease of the frequency resolution with increasing masker frequency. However, on the non-linear
"Bark scale," the shapes of the masked thresholds are almost frequency independent. This
Bark scale covers the frequency range from zero (0) to 20 kHz with 24 units (Bark).
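To illustrate the non-linearity of the Bark scale, the sketch below uses the analytic approximation of Zwicker and Terhardt, z = 13*arctan(0.00076*f) + 3.5*arctan((f/7500)^2) with f in Hz; this particular formula is an assumption for illustration and is not taken from the specification. It maps 20 kHz to roughly 24.6 Bark, consistent with the approximately 24 units mentioned above, and shows that one Bark spans about 100 Hz at low frequencies but more than 2 kHz around 10 kHz.

```python
import math


def hz_to_bark(f):
    """Zwicker-Terhardt approximation of the critical-band rate (Bark) at f Hz."""
    return 13.0 * math.atan(0.00076 * f) + 3.5 * math.atan((f / 7500.0) ** 2)


def bark_width_hz(f):
    """Approximate bandwidth in Hz of one Bark around f, i.e. 1 / (dz/df)."""
    u = (f / 7500.0) ** 2
    dz_df = (13.0 * 0.00076 / (1.0 + (0.00076 * f) ** 2)
             + 3.5 * (2.0 * f / 7500.0 ** 2) / (1.0 + u * u))
    return 1.0 / dz_df


if __name__ == "__main__":
    for f in (100.0, 1000.0, 5000.0, 10000.0, 20000.0):
        print(f"{f:8.0f} Hz -> {hz_to_bark(f):5.2f} Bark, 1 Bark ~ {bark_width_hz(f):6.0f} Hz")
```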
While these characteristics have to be approximated by the psychoacoustic model
315, it is advantageous if the structure of the pre-filter 310 and post-filter 380 also supports the
appropriate frequency dependent temporal and spectral resolution. Therefore, as previously
indicated, the selected filter structure described below is based on a frequency-warping technique
that allows filter design on a non-linear frequency scale.

Structure of the Pre-Filter and Post-Filter 
The pre-filter 310 and post-filter 380 must model the shape of the masked
threshold in the decoder 350 and its inverse in the encoder 300. The most common forms of
predictors use a minimum phase finite-impulse response [(FIR)] filter in the encoder 300 leading
to an infinite impulse response [IIR] filter in the decoder. FIG. 4 illustrates a[n] finite-impulse
response [FIR] predictor 400 of order P, and the corresponding infinite impulse response [IIR]
predictor 450. The structure shown in FIG. 4 can be made time-varying quite easily, since the
actual coefficients in both filters are equal and therefore can be modified synchronously.
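The FIG. 4 pair can be sketched as follows, assuming per-sample coefficient lists [h_1, ..., h_P] (P may change); the function names are hypothetical and the gain factor is omitted. Because the decoder's IIR loop applies exactly the same coefficients to its own past outputs, which equal the encoder's past inputs when no quantization is applied, the coefficients can be switched at any sample and the cascade still reconstructs the input up to floating-point rounding.

```python
def fir_prefilter(x, coeffs):
    """Encoder side of FIG. 4 (sketch): e[n] = x[n] - sum_k h_k[n] * x[n - k].
    coeffs[n] is the list [h_1, ..., h_P] in force at sample n."""
    e = []
    for n, (xn, h) in enumerate(zip(x, coeffs)):
        pred = sum(hk * x[n - k] for k, hk in enumerate(h, start=1) if n >= k)
        e.append(xn - pred)
    return e


def iir_postfilter(e, coeffs):
    """Decoder side of FIG. 4 (sketch): y[n] = e[n] + sum_k h_k[n] * y[n - k],
    using the same coefficients as the encoder at every sample."""
    y = []
    for n, (en, h) in enumerate(zip(e, coeffs)):
        pred = sum(hk * y[n - k] for k, hk in enumerate(h, start=1) if n >= k)
        y.append(en + pred)
    return y


if __name__ == "__main__":
    x = [((7 * n) % 11) - 5.0 for n in range(40)]
    # Coefficients switched at sample 20, identically in encoder and decoder.
    coeffs = [[0.5, -0.2]] * 20 + [[0.9]] * 20
    y = iir_postfilter(fir_prefilter(x, coeffs), coeffs)
    print(max(abs(a - b) for a, b in zip(x, y)))  # tiny: input is reconstructed
```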

For modeling masked thresholds, a representation with the capability to give more
detail at lower frequencies is desirable. For achieving such an unequal resolution over frequency,
a frequency-warping technique, described, for example, in H. W. Strube, "Linear Prediction on a
Warped Frequency Scale," J. of the Acoust. Soc. Am., vol. 68, 1071-1076 (1980), incorporated
by reference herein, can be applied effectively. This technique is very efficient in terms of the
approximation accuracy achievable for a given filter order, which is closely related to the
amount of side information required for adaptation.

Generally, the frequency-warping technique is based on a principle which is
known in filter design from techniques like lowpass-lowpass transform and lowpass-bandpass
transform. In a discrete time system, an equivalent transformation can be implemented by
replacing every delay unit by an allpass. A frequency scale reflecting the non-linearity of the
"critical band" scale would be the most appropriate. See, M. R. Schroeder et al., "Optimizing
Digital Speech Coders By Exploiting Masking Properties Of The Human Ear," Journal of the
Acoust. Soc. Am., v. 66, 1647-1652 (Dec. 1979); and U. K. Laine et al., "Warped Linear
Prediction (WLP) in Speech and Audio Processing," in IEEE Int. Conf. Acoustics, Speech,
Signal Processing, III-349 - III-352 (1994), each incorporated by reference herein.
Generally, the use of a first order allpass filter 500, shown in FIG. 5, gives
sufficient approximation accuracy. However, the direct substitution of the first order allpass
filter 500 into the finite impulse response [FIR] predictor 400 of FIG. 4 is only possible for the
pre-filter 310. Since the first order allpass filter 500 has a direct path without delay from its input
to its output, the substitution of the first order allpass filter 500 into the feedback structure of the
infinite impulse response [IIR] predictor 450 in FIG. 4 would result in a zero-lag loop. Therefore, a
modification of the filter structure is required. In order to allow synchronous adaptation of the
filter coefficients in the encoder and decoder, both systems should be modified as described
hereinafter.
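For the pre-filter, the direct substitution described above can be sketched as follows. The sketch assumes the first-order allpass of FIG. 5 has the transfer function D(z) = (z^-1 - a) / (1 - a*z^-1), a common convention that may differ in sign from the figure, and the function name is hypothetical. The term -a*v in each section is the delay-free path from input to output: for a unit impulse, the very first output sample already depends on every coefficient, 1 - sum_k h_k*(-a)^k, which is why the same substitution inside the IIR feedback loop of FIG. 4 would create a zero-lag loop.

```python
def allpass_fir_prefilter(x, h, a):
    """FIR predictor of FIG. 4 with each unit delay replaced by the first-order
    allpass D(z) = (z^-1 - a) / (1 - a z^-1) (assumed convention).
    Returns e[n] = x[n] - sum_k h_k * v_k[n], where v_k is the output of the k-th allpass."""
    p = len(h)
    prev_in = [0.0] * p   # v_{k-1}[n-1]: previous input of section k
    prev_out = [0.0] * p  # v_k[n-1]: previous output of section k
    e = []
    for sample in x:
        v = sample
        pred = 0.0
        for k in range(p):
            # y[n] = -a*x[n] + x[n-1] + a*y[n-1]; the -a*x[n] term has no delay.
            out = -a * v + prev_in[k] + a * prev_out[k]
            prev_in[k] = v
            prev_out[k] = out
            pred += h[k] * out
            v = out
        e.append(sample - pred)
    return e


if __name__ == "__main__":
    print(allpass_fir_prefilter([1.0, 0.0, 0.0, 0.0], [0.3, 0.2], 0.5))
```

With a = 0 each allpass section degenerates to a unit delay, and the function reduces to the ordinary FIR predictor 400.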

In order to overcome this zero-lag problem, the delay units of the original
structure (FIG. 4) are replaced by first order infinite impulse response [IIR] filters containing
only the feedback part of the first order allpass filter 500, as described in H.W. Strube,
incorporated by reference above. FIG. 6 is a schematic diagram of a finite impulse response
[FIR] filter 600 and an infinite impulse response [IIR] filter 650 exhibiting frequency warping in
accordance with one embodiment of the present invention. The coefficients of the filter 600 need
to be modified to obtain the same frequency response as a structure with allpass units. The
coefficients, g_k (0 < k < P), are obtained from the original linear-predictive coefficient [LPC]
filter coefficients with the following transformation:



The use of a first order allpass in the finite impulse response [FIR] filter 600 leads to the
following mapping of the frequency scale:

$$\tilde{\omega} = \omega + 2\arctan\left(\frac{a\sin\omega}{1 - a\cos\omega}\right)$$

The derivative of this function:

$$v(\omega) = \frac{d\tilde{\omega}}{d\omega} = \frac{1 - a^{2}}{1 + a^{2} - 2a\cos\omega}$$

indicates whether the frequency response of the resulting filter 600 appears compressed (v > 1) or
stretched (v < 1). The warping coefficient a should be selected depending on the sampling
frequency. For example, at 32 kHz, a warping coefficient value around 0.5 is a good choice for
the pre-filter application. 
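The derivative can be evaluated numerically with the sketch below (function names hypothetical; radian frequencies normalized so that pi corresponds to half the sampling frequency). Setting v(omega) = 1 gives cos(omega) = a, so the warped axis switches from v > 1 to v < 1 at f = fs*arccos(a)/(2*pi). For a = 0.5 at 32 kHz this is about 5.3 kHz, and v ranges from (1 + a)/(1 - a) = 3 at DC down to (1 - a)/(1 + a) = 1/3 at the Nyquist frequency.

```python
import math


def warping_derivative(omega, a):
    """v(omega) = d(omega_warped)/d(omega) = (1 - a^2) / (1 + a^2 - 2 a cos(omega))."""
    return (1.0 - a * a) / (1.0 + a * a - 2.0 * a * math.cos(omega))


def crossover_hz(a, fs):
    """Frequency in Hz at which v(omega) = 1, i.e. cos(omega) = a."""
    return fs * math.acos(a) / (2.0 * math.pi)


if __name__ == "__main__":
    a, fs = 0.5, 32000.0
    print(warping_derivative(0.0, a), warping_derivative(math.pi, a))
    print(crossover_hz(a, fs))
```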

It is noted that the pre-filter method of the present invention is also useful for
audio file storage applications. In an audio file storage application, the output signal of the pre-filter
310 can be directly quantized using a fixed quantizer and the resulting integer values can be
encoded using lossless coding techniques. These can consist of standard file compression
techniques or techniques highly optimized for lossless coding of audio signals. This approach
extends techniques that, up to now, were suitable only for lossless compression to perceptual
audio coding.
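A minimal sketch of this storage path follows. It assumes a uniform fixed quantizer with a step size of 1 (i.e., that the gain factor in the pre-filter 310 has already normalized the signal to the masked threshold level) and uses the general-purpose zlib compressor from the Python standard library as a stand-in for "standard file compression techniques"; the function names and the 32-bit little-endian sample packing are illustrative choices.

```python
import struct
import zlib


def encode_prefiltered(samples, step=1.0):
    """Quantize pre-filter output with a fixed uniform quantizer, then code the
    integers losslessly (zlib as a stand-in for a file compressor)."""
    ints = [round(s / step) for s in samples]  # Python rounds halves to even
    raw = struct.pack(f"<{len(ints)}i", *ints)  # 32-bit little-endian integers
    return zlib.compress(raw, 9)


def decode_prefiltered(blob, step=1.0):
    """Invert the lossless stage exactly and rescale; the result is the
    quantized pre-filter output that would be fed to the post-filter."""
    raw = zlib.decompress(blob)
    ints = struct.unpack(f"<{len(raw) // 4}i", raw)
    return [q * step for q in ints]


if __name__ == "__main__":
    prefiltered = [0.2, 1.7, -2.4, 3.0] * 100
    blob = encode_prefiltered(prefiltered)
    print(len(blob), decode_prefiltered(blob)[:4])
```

Only the quantizer is lossy here; the integer stage round-trips exactly, so any lossless audio coder could replace zlib without changing the perceptual result.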

It is to be understood that the embodiments and variations shown and described 
herein are merely illustrative of the principles of this invention and that various modifications 
may be implemented by those skilled in the art without departing from the scope and spirit of the 
invention. 